In today's world, networks seem to appear everywhere. There are social networks, communication networks, financial transaction networks, gene regulatory networks, disease transmission networks, ecological food networks, mobile telephone and sensor networks and more. We, our professional colleagues, our friends and family, and especially our students, are often part of online networks such as Facebook, LinkedIn and now Google Buzz. Some network structures are static and others are dynamically evolving. Networks are usually represented in terms of graphs with the nodes representing entities, for example, people, and the edges representing ties or relationships. Edges may be directed or undirected depending on the application and the substantive question of interest. In terms of statistical science, a network model is one that accounts for the structure of the network ties in terms of the probability that each network tie exists, whether conditional on all other ties or as part of the joint distribution of the ensemble of ties.

Ideas and language from graph theory abound in the technical literature on networks. A typical representation involves a network with $N$ nodes, having $\binom{N}{2}$ unordered pairs of nodes, and hence $2\binom{N}{2}$ possible directed edges. If the labels on edges reflect the nodes they link, then $Y_{ij}$ represents the existence of an edge from individual $i$ to individual $j$, and $\{Y\} = \{Y_{12}, Y_{13}, \ldots, Y_{(N-1)N}\}$ represents the ties in the graph. The simplest network models assume the edges to be independent, while a statistically more interesting class of models treats the dyadic structures for pairs of nodes as independent.
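
To make the notation concrete, here is a minimal Python sketch, assuming nodes labelled 1..N and ties stored in a dict keyed by (i, j). It counts the pairs and possible directed edges for the 18-node example mentioned later, and contrasts the two model classes just described: fully independent edges, and dyadic independence, where each pair's two directed ties are drawn jointly. The function names and the four-state dyad distribution are hypothetical illustrations, not a specific published parametrization such as that of Holland and Leinhardt (1981).

```python
import math
import random
from itertools import combinations


def pair_counts(n):
    """Return (unordered pairs C(n, 2), possible directed edges 2 * C(n, 2))."""
    pairs = math.comb(n, 2)
    return pairs, 2 * pairs


def sample_independent_edges(n, p, rng):
    """Simplest model: every directed tie Y_ij (i != j) is present
    independently of all others with the same probability p."""
    nodes = range(1, n + 1)
    return {(i, j): int(rng.random() < p) for i in nodes for j in nodes if i != j}


# Dyad states (Y_ij, Y_ji): null, i -> j only, j -> i only, mutual.
DYAD_STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def sample_dyadic_independent(n, state_probs, rng):
    """Dyadic independence: pairs {i, j} are independent of one another, but
    the two ties within a pair are drawn jointly, so reciprocity can be
    modeled. `state_probs` weights the four DYAD_STATES (hypothetical
    parametrization chosen for illustration)."""
    y = {}
    for i, j in combinations(range(1, n + 1), 2):
        y_ij, y_ji = rng.choices(DYAD_STATES, weights=state_probs)[0]
        y[(i, j)] = y_ij
        y[(j, i)] = y_ji
    return y


rng = random.Random(2010)
print(pair_counts(18))  # (153, 306): 153 pairs, 306 possible directed edges
ties = sample_dyadic_independent(18, [0.7, 0.1, 0.1, 0.1], rng)
print(len(ties), sum(ties.values()))
```

Putting extra weight on the mutual state raises reciprocity, something the independent-edge model cannot express with a single probability p.
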

In an extensive review of the statistical literature on network modeling, 
Goldenberg et al. (2010) note: 

Almost all of the "statistically" oriented literature on the analysis of networks derives from a handful of seminal papers. In social psychology and sociology there is the early work of Simmel (1950) at the turn of the last century and Moreno (1934) in the 1930s, as well as the empirical studies of Milgram (1967) and Travers and Milgram (1969) in the 1960s; in mathematics/probability there is the Erdős-Rényi work on random graph models [Erdős and Rényi (1959, 1960), and a closely related Annals of Mathematical Statistics paper by Gilbert (1959)]. There are of course other papers that dealt with these topics contemporaneously or even earlier. But these are the ones that appear to have had lasting impact.

Received March 2010.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2010, Vol. 4, No. 1, 1-4. This reprint differs from the original in pagination and typographic detail.

S. E. FIENBERG

Statistical work in the late 1970s and early 1980s emphasized models that exploited dyadic independence, for example, in the work of Holland and Leinhardt (1981). More complex exponential random graph models (ERGMs) then drew considerable attention; for example, see Frank and Strauss (1986). But the estimation of parameters for such models has turned out to be more problematic than expected; for example, see the discussion in Rinaldo, Fienberg and Zhou (2009).
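
One elementary obstacle to likelihood-based estimation in ERGMs can be seen directly. An ERGM sets $P(Y = y) = \exp\{\theta^\top s(y)\}/\kappa(\theta)$, and the normalizing constant $\kappa(\theta)$ sums over every possible graph. The Python sketch below assumes an undirected graph with only two statistics, edges and triangles (the Markov graphs of Frank and Strauss also use counts such as k-stars), and evaluates $\kappa(\theta)$ by brute force. The function names are hypothetical, and the estimation difficulties discussed in the literature go beyond this; the sketch only shows why exact likelihood evaluation does not scale.

```python
import math
from itertools import combinations, product


def edge_triangle_stats(n, edges):
    """Sufficient statistics s(y) = (number of edges, number of triangles)
    for an undirected graph on nodes 0..n-1; edges are (i, j) with i < j."""
    edge_set = set(edges)
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if (i, j) in edge_set and (i, k) in edge_set and (j, k) in edge_set
    )
    return len(edge_set), triangles


def log_normalizer(n, theta):
    """log kappa(theta) = log sum_y exp(theta . s(y)), by brute force over all
    2**C(n, 2) undirected graphs; feasible only for very small n."""
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for present in product((0, 1), repeat=len(pairs)):
        edges = [pair for pair, on in zip(pairs, present) if on]
        n_edges, n_triangles = edge_triangle_stats(n, edges)
        total += math.exp(theta[0] * n_edges + theta[1] * n_triangles)
    return math.log(total)


def log_prob(n, edges, theta):
    """Log-probability of one observed graph under the edge-triangle ERGM."""
    n_edges, n_triangles = edge_triangle_stats(n, edges)
    return theta[0] * n_edges + theta[1] * n_triangles - log_normalizer(n, theta)


triangle = [(0, 1), (0, 2), (1, 2)]
print(log_prob(4, triangle, (-1.0, 0.5)))
print(2 ** math.comb(18, 2))  # graphs in the sum for n = 18 (more than 10**46)
```

With the triangle parameter set to zero the model collapses to independent edges and the normalizer factorizes as $\binom{n}{2}\log(1 + e^{\theta_1})$; a nonzero triangle term breaks that factorization, which is what forces the sum over all graphs.
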

The network modeling literature has "taken off" in the past decade, in part because of the interest in structures associated with the internet, and there are contributors from many different disciplines, including biology, computer science, statistical physics, sociology and, of course, statistics. Kolaczyk (2009) provides a book-length treatment of a selection of approaches, and Airoldi et al. (2007) provides a compilation of relevant papers. In addition, there is the probabilistic literature that has derived from the Erdős-Rényi-Gilbert formulations, much of which is described in Chung and Lu (2006) and Durrett (2006).

Methods for the analysis of network data now take at least as many forms as the applications in which they arise. While the original examples of networks analyzed in the literature were typically small (e.g., n = 18 nodes corresponding to monks in a monastery), the size of networks analyzed with more modern methodology has grown by many orders of magnitude. Networks with 1000 nodes are common, for example, in the study of protein-protein interactions, and online networks such as Facebook include hundreds of millions of nodes. An interesting statistical question is whether there is a relevant asymptotics associated with network models as we move into such high dimensions. A recent paper by Bickel and Chen (2009) opens the door to such important statistical issues by linking back to ideas in the probabilistic network literature.

The response to our initial call for papers on the topic of network modeling 
was so overwhelming that we are dividing the special section into two parts, 
with the first appearing in this issue of The Annals of Applied Statistics 
(Volume 4, No. 1), and the remainder in the next issue (Volume 4, No. 2). 

In Part I of this special section, we include a diverse collection of papers 
with applications spanning sampling of rare populations, internet flows, gene 
networks, online e-loyalty networks, document-as-nodes links induced from 
text, and more. The methodologies begin with ERGMs but include sparse 
regression models and state space models. 
In Modeling Social Networks from Sampled Data, Handcock and Gile develop the conceptual and computational statistical framework for likelihood inference for ERGMs based on sampled network information, especially for data from adaptive network designs. They motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on the collaborative working relations between 36 partners in a New England law firm.

In Analysis of Dependence Among Size, Rate and Duration of Internet Flows, Park, Hernández-Campos, Marron, Jeffay and Smith use Pearson's correlation coefficient and extremal dependence analysis to study flows in packet traces from three internet networks. The correlations between size and duration turn out to be much smaller than one might expect and can be strongly affected by applying thresholds to size or duration. Using extremal dependence analysis, they draw a similar conclusion, that is, near independence for extremal values of size and rate.
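
Why a threshold can change a correlation so much is easy to see on toy numbers. The Python sketch below assumes that "applying a threshold to size" means recomputing Pearson's coefficient only on flows whose size exceeds a cutoff; the six flows are invented values, not data from the paper, and the helper names are hypothetical. Extremal dependence analysis is not sketched.

```python
import math


def pearson(x, y):
    """Pearson's sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def thresholded_correlation(x, y, threshold):
    """Correlation recomputed on the subset of pairs with x above `threshold`
    (e.g., only flows whose size exceeds a cutoff)."""
    kept = [(a, b) for a, b in zip(x, y) if a > threshold]
    xs, ys = zip(*kept)
    return pearson(xs, ys)


# Toy values, not measurements: "size" and "duration" of six hypothetical flows.
size = [1, 2, 3, 4, 5, 6]
duration = [1, 2, 3, 3, 2, 1]
print(pearson(size, duration))                     # 0.0 over all flows
print(thresholded_correlation(size, duration, 3))  # -1.0 among the large flows
```

The overall coefficient is zero while the thresholded one is perfectly negative, because Pearson's coefficient summarizes only the linear trend of whichever subset it is computed on.
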

Peng, Zhu, Han, Noh, Pollack and Wang work with sparse regression approaches in Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer. They apply their methods to genome-wide RNA transcript levels and DNA copy numbers measured for 172 tumor samples.

In Optimal Experiment Design in a Filtering Context with Application to Sampled Network Data, Singhal and Michailidis examine the problem of optimal design in the context of filtering multiple random walks on networks. They apply their methodology to tracking network flow volumes using sampled data, where the design variable corresponds to controlling the sampling rate, and they relate their approach to the steady-state optimal design for state space models.

Political networks and gene regulatory networks are the primary focus of application in Estimating Time-Varying Networks by Kolar, Song, Ahmed and Xing. They describe an approach that builds on a temporally smoothed ℓ1-regularized logistic regression formalism that can be cast as a standard convex optimization problem and solved efficiently using generic solvers scalable to large networks.
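
The basic building block here is ℓ1-regularized logistic regression used for neighborhood selection: one node's state is regressed on all the others, and nodes with nonzero coefficients are its estimated neighbors. The Python sketch below swaps in a simpler way of borrowing strength over time than the temporal smoothing described above: observations are kernel-reweighted around a target time, and the weighted problem is solved by plain proximal gradient (ISTA) rather than a generic convex solver. It assumes ±1 node states, a Gaussian kernel and a fixed step size; all names are hypothetical, and this is not the authors' estimator.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |z| (the l1 shrinkage step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def weighted_l1_logistic(X, y, w, lam, step=0.1, iters=500):
    """Proximal-gradient (ISTA) fit of
        min_b  sum_t w_t * logloss(y_t, b0 + x_t . b) / sum_t w_t + lam * ||b||_1
    with y_t in {0, 1}; the intercept b0 is not penalized. `step` is assumed
    small enough (at most 1 / Lipschitz constant) for the data at hand."""
    p = len(X[0])
    b0, b = 0.0, [0.0] * p
    total_w = sum(w)
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xt, yt, wt in zip(X, y, w):
            eta = b0 + sum(bj * xj for bj, xj in zip(b, xt))
            r = wt * (sigmoid(eta) - yt) / total_w
            g0 += r
            for j in range(p):
                g[j] += r * xt[j]
        b0 -= step * g0
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, g)]
    return b0, b


def kernel_weights(num_times, t_star, bandwidth):
    """Gaussian kernel weights emphasizing observations near time t_star."""
    return [math.exp(-0.5 * ((t - t_star) / bandwidth) ** 2) for t in range(num_times)]


def estimate_neighbors(states, node, t_star, bandwidth, lam):
    """Estimated neighbors of `node` at time t_star. `states[t]` is a list of
    +/-1 node states at time t (hypothetical input format)."""
    others = [v for v in range(len(states[0])) if v != node]
    X = [[s[v] for v in others] for s in states]
    y = [(s[node] + 1) // 2 for s in states]
    w = kernel_weights(len(states), t_star, bandwidth)
    _, b = weighted_l1_logistic(X, y, w, lam)
    return [v for v, bj in zip(others, b) if bj != 0.0]


states = [[1, 1], [-1, -1]] * 5  # toy data: node 0 always copies node 1
print(estimate_neighbors(states, node=0, t_star=4, bandwidth=2.0, lam=0.01))
```

Larger values of `lam` shrink more coefficients exactly to zero and so give sparser estimated networks; a very large `lam` removes every edge.
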

Working with scientific citation networks, hyperlinked web pages and geographically tagged news articles, Chang and Blei develop a Hierarchical Relational Model of Document Networks. Their model is a hierarchical model of both network structure and node attributes, where the attributes of each document are its words; for each pair of documents, the model treats their link as a binary random variable that is conditioned on their contents. They derive efficient inference and estimation algorithms based on variational methods that take advantage of sparsity and scale with the number of links.
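
A link "conditioned on their contents" can be illustrated with a small link function. The Python sketch below assumes a logistic form, $\sigma(\eta^\top(\bar z_d \circ \bar z_{d'}) + \nu)$, where $\bar z_d$ holds document $d$'s empirical topic proportions and $\circ$ is the elementwise product; this is our reading of one link function in this line of work, not a statement of the authors' exact specification. Per-word topic labels are taken as given (the variational inference is not sketched), and the documents, parameter values and function names are hypothetical.

```python
import math
from collections import Counter


def topic_proportions(assignments, num_topics):
    """Empirical topic proportions z-bar_d from a document's per-word topic labels."""
    counts = Counter(assignments)
    return [counts[k] / len(assignments) for k in range(num_topics)]


def link_probability(z_d, z_e, eta, nu):
    """Assumed logistic link: sigma(eta . (zbar_d * zbar_e) + nu), where * is
    elementwise. With positive eta, documents sharing topics link more often."""
    k = len(eta)
    zbar_d = topic_proportions(z_d, k)
    zbar_e = topic_proportions(z_e, k)
    score = sum(e * a * b for e, a, b in zip(eta, zbar_d, zbar_e)) + nu
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical per-word topic labels for three short documents (K = 2 topics).
doc_a, doc_b, doc_c = [0, 0, 0, 1], [0, 0, 1, 1], [1, 1, 1, 1]
eta, nu = [6.0, 6.0], -2.0
print(link_probability(doc_a, doc_b, eta, nu))  # more topic overlap
print(link_probability(doc_a, doc_c, eta, nu))  # less topic overlap
```

Because the score depends on the product of the two documents' topic proportions, documents with no topics in common get link probability $\sigma(\nu)$ whatever $\eta$ is.
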

Jank and Yahav focus on a dataset involving 30,000 auctions from one of the main consumer-to-consumer online auction houses. They propose a novel measure of e-loyalty via the associated network of transactions between bidders and sellers. In E-Loyalty Networks in Online Auctions, they employ ideas from functional principal component analysis to derive, from this network, the distribution of perceived loyalty of every individual seller and associated loyalty scores. In the process, they confront the clustering feature of loyalty networks, with a few high-volume sellers accounting for most of the individual transactions.

Part II of this special section will explore another diverse collection of network models and applications.